The following dataset covers the state of happiness in 166 countries across the world's continents from 2005 to 2020 and shows how the new science of happiness explains personal and national variations in happiness. The dataset was retrieved from Kaggle. The happiness scores (Life Ladder) and rankings use data from the Gallup World Poll. The columns following the happiness score estimate the extent to which each of six factors (economic production, social support, life expectancy, freedom, absence of corruption, and generosity) contributes to making life evaluations higher in each country than they are in Dystopia, a hypothetical country whose values equal the world's lowest national averages for each of the six factors. Besides these six common factors, two more, positive affect and negative affect, were added as extra metrics.
The goal of the analysis is to better understand how happiness scores are distributed across countries by geography, how the happiness score correlates with the six main metrics, and how that correlation differs between countries with high scores and countries with low scores.
The original data was organized by country and year, from 2005 to 2020. To analyze happiness by continent, a new variable 'Continent' was added that assigns each country to its geographical region. The happiness score for each country was represented by the mean of its yearly happiness scores over the 2005 to 2020 period (NA values were ignored).
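The per-country averaging described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the code used for the report (which appears to have been produced in R); the rows, the country names, and the `continent_of` lookup are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical rows (country, year, Life Ladder score); None marks a missing value.
rows = [
    ("Aland", 2005, 7.0),
    ("Aland", 2006, None),
    ("Aland", 2007, 7.4),
    ("Borduria", 2005, 4.0),
    ("Borduria", 2006, 4.5),
]

# Hypothetical lookup standing in for the added 'Continent' variable.
continent_of = {"Aland": "Europe", "Borduria": "Africa"}

def mean_score_by_country(rows):
    """Average each country's scores over all years, skipping missing values."""
    scores = defaultdict(list)
    for country, _year, score in rows:
        if score is not None:
            scores[country].append(score)
    # A country whose scores are all missing does not appear in the result.
    return {country: fmean(values) for country, values in scores.items()}

country_means = mean_score_by_country(rows)
for country, mean in sorted(country_means.items()):
    print(country, continent_of[country], round(mean, 2))
```

Skipping missing values before averaging plays the same role as ignoring NA values in the original computation.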
What is the happiness score of a randomly chosen country? Below is a box plot of the scores for the countries on each continent. Once a country's geographical region is known, the plot gives the likely range of its happiness score.
As shown above, countries in Africa have the lowest average happiness score, and the country with the highest happiness score is in Europe. Oceanian countries have the highest average happiness score, although this result is not representative since the data includes only the two most developed countries in the region. The distribution of happiness scores in South American countries has the narrowest spread.
As shown above, 42% of the top 50 countries by happiness score are from Europe, and no African country makes the top 50 list. Asia is in second place with 22% of the top 50 countries, followed by North and South America, which contribute the same number of countries. Unsurprisingly, the two countries from Oceania are also included. In the bottom 50, three quarters of the countries are from Africa, followed by Asia with 9 countries, or 18% of the list. Oceanian and South American countries are absent from the bottom 50 list.
As expected, most of the countries in Africa are in the bottom 50 list, accounting for 75% of the countries on this continent. Although Europe has the largest number of countries in the top 50 list, South America has the highest percentage of its own countries in that list (Oceania is excluded because of insufficient data). By this measure, Africa is the least happy continent and South America is the happiest.
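Two different percentages are used in the paragraphs above: a continent's share of the top (or bottom) 50 list, and the share of a continent's own countries that appear in that list. The sketch below computes both on hypothetical records; the function name `continent_shares` and the sample data are illustrative assumptions, not part of the original analysis.

```python
from collections import Counter

# Hypothetical (country, continent, mean score) records; not taken from the dataset.
countries = [
    ("A", "Europe", 7.5), ("B", "Europe", 6.9), ("C", "Europe", 5.1),
    ("D", "Africa", 4.2), ("E", "Africa", 3.9), ("F", "Africa", 5.5),
    ("G", "Asia", 6.1), ("H", "Asia", 4.8),
]

def continent_shares(countries, n, top=True):
    """Return {continent: (share of the n-country list, share of the continent's countries)}."""
    # Highest scores first for the top list, lowest first for the bottom list.
    ranked = sorted(countries, key=lambda rec: rec[2], reverse=top)
    selected = Counter(continent for _, continent, _ in ranked[:n])
    totals = Counter(continent for _, continent, _ in countries)
    return {c: (selected[c] / n, selected[c] / totals[c]) for c in totals}

top_shares = continent_shares(countries, n=4, top=True)
bottom_shares = continent_shares(countries, n=4, top=False)
for continent, (of_list, of_continent) in top_shares.items():
    print(f"{continent}: {of_list:.0%} of the top list, {of_continent:.0%} of its countries")
```

A continent with few countries can have a small share of the list and still a large share of its own countries in it, which is why the two measures can rank continents differently.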
As shown above, countries on all continents experienced a decline in happiness over the period. Scores on every continent except Oceania fell sharply between 2005 and 2006 and then recovered slowly, with fluctuations. Scores for Oceanian countries fluctuated only slightly, while scores for South American countries began a continuous decline in 2019 after a short recovery. In contrast, scores for European countries have grown steadily since 2009.
The World Happiness Report is widely recognized, as governments, organizations, and civil society increasingly use happiness indicators to inform their policy-making decisions. It is therefore essential to know how the happiness score correlates with the most commonly used factors, to gain insight into how those factors relate to happiness.
##                                  Life.Ladder Log.GDP.per.capita Social.support
## Life.Ladder                        1.0000000          0.8402260     0.76807612
## Log.GDP.per.capita                 0.8402260          1.0000000     0.77283422
## Social.support                     0.7680761          0.7728342     1.00000000
## Healthy.life.expectancy.at.birth   0.8017737          0.8531780     0.70625668
## Freedom.to.make.life.choices       0.6225599          0.4753858     0.43449505
## Generosity                         0.1106616         -0.1044322    -0.02324931
## Perceptions.of.corruption         -0.3737247         -0.2744379    -0.14640735
##                                  Healthy.life.expectancy.at.birth
## Life.Ladder                                            0.80177369
## Log.GDP.per.capita                                     0.85317801
## Social.support                                         0.70625668
## Healthy.life.expectancy.at.birth                       1.00000000
## Freedom.to.make.life.choices                           0.41975361
## Generosity                                            -0.02352294
## Perceptions.of.corruption                             -0.24089306
##                                  Freedom.to.make.life.choices  Generosity
## Life.Ladder                                         0.6225599  0.11066156
## Log.GDP.per.capita                                  0.4753858 -0.10443218
## Social.support                                      0.4344950 -0.02324931
## Healthy.life.expectancy.at.birth                    0.4197536 -0.02352294
## Freedom.to.make.life.choices                        1.0000000  0.27231124
## Generosity                                          0.2723112  1.00000000
## Perceptions.of.corruption                          -0.5162586 -0.25940108
##                                  Perceptions.of.corruption
## Life.Ladder                                     -0.3737247
## Log.GDP.per.capita                              -0.2744379
## Social.support                                  -0.1464074
## Healthy.life.expectancy.at.birth                -0.2408931
## Freedom.to.make.life.choices                    -0.5162586
## Generosity                                      -0.2594011
## Perceptions.of.corruption                        1.0000000
As shown above, the happiness score is positively correlated with Log.GDP.per.capita (0.84), Healthy.life.expectancy.at.birth (0.80), Social.support (0.77), and Freedom.to.make.life.choices (0.62), with Log.GDP.per.capita showing the strongest correlation. Generosity is only weakly correlated with the happiness score (0.11), and Perceptions.of.corruption is negatively correlated with it (-0.37). The result suggests that happiness is closely associated with income, social support, health, and freedom, and that income and health are the factors most strongly associated with happiness.
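For reference, each entry of a correlation matrix like the one above is a Pearson correlation coefficient between two columns. The sketch below is a minimal plain-Python version; the column values are hypothetical, and the options the original computation used for missing values are not known, so none are modelled here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (spread_x * spread_y)

def correlation_matrix(columns):
    """Pairwise Pearson correlations for a {name: values} mapping."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical values for four countries; not taken from the dataset.
columns = {
    "Life.Ladder": [4.0, 5.0, 6.0, 7.0],
    "Log.GDP.per.capita": [7.5, 8.5, 9.0, 10.5],
    "Perceptions.of.corruption": [0.9, 0.8, 0.6, 0.4],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

The matrix is symmetric with ones on the diagonal, which matches the layout of the output above. A correlation describes association only and does not by itself establish that a factor causes happiness.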
As shown above, the distributions of happiness score, per capita GDP, and healthy life expectancy at birth follow a similar pattern. This is consistent with the pairs plot and supports the conclusion that happiness is closely associated with income and health.
The central limit theorem states that when sufficiently large random samples are drawn with replacement from a population with mean μ and standard deviation σ, the distribution of the sample means tends toward a normal distribution, even if the original variable is not normally distributed. As the plot above suggests, Log.GDP.per.capita is highly correlated with the happiness score, so the distribution of this variable reflects the distribution of the happiness score to some extent.
As the histogram above shows, Log.GDP.per.capita has a left-skewed distribution. This variable is used as an example to show the central limit theorem at work. The histograms below show the sample means of 1000 random samples of size 10, 20, 30, and 40; in each case the sample means are approximately normally distributed.
## Data mean: 9.2767 Data sd: 1.186256
##
## Sample Size = 10 Mean = 9.28474 SD = 0.390164
## Sample Size = 20 Mean = 9.272381 SD = 0.2863155
## Sample Size = 30 Mean = 9.271196 SD = 0.225088
## Sample Size = 40 Mean = 9.283652 SD = 0.195892
As shown above, the means of the sample means for the four sample sizes are almost identical to the mean of the data, while their standard deviations decrease as the sample size increases, roughly in line with σ/√n. The plots reflect the same changes: as the sample size increases, the distribution becomes less skewed, approaches the shape of a normal distribution, and narrows in spread.
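The behaviour described above can be reproduced with a small simulation. The sketch below is an illustration in plain Python under stated assumptions: the population is a hypothetical left-skewed stand-in for Log.GDP.per.capita rather than the real data, samples are drawn with replacement, and 1000 sample means are taken per sample size as in the text.

```python
import random
from math import sqrt
from statistics import fmean, pstdev, stdev

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical left-skewed population (long tail toward small values),
# standing in for Log.GDP.per.capita; these are not the real data.
population = [11.0 - rng.expovariate(1.0) for _ in range(5000)]
pop_mean, pop_sd = fmean(population), pstdev(population)

def standard_error(sd, n):
    """Theoretical SD of the mean of n independent draws: sd / sqrt(n)."""
    return sd / sqrt(n)

def sample_mean_summary(values, sample_size, draws=1000):
    """Mean and SD of `draws` sample means, each over `sample_size` values
    drawn with replacement (an assumption about the original procedure)."""
    means = [fmean(rng.choices(values, k=sample_size)) for _ in range(draws)]
    return fmean(means), stdev(means)

results = {n: sample_mean_summary(population, n) for n in (10, 20, 30, 40)}
for n, (mean, sd) in results.items():
    print(f"n={n}: mean={mean:.3f} sd={sd:.3f} "
          f"sigma/sqrt(n)={standard_error(pop_sd, n):.3f}")
```

Applied to the data standard deviation printed above, `standard_error(1.186256, 10)` gives about 0.375, close to the 0.390 reported for sample size 10.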
Sampling is the selection of a subset of individuals from a statistical population in order to estimate trends or patterns of the whole population from what can be seen in the subset. A variety of probability sampling methods can be applied to the data. The methods used in this analysis are simple random sampling without replacement (SRSWOR), systematic sampling, and stratified sampling. The comparison focuses on the geographical region of the sampled countries, and the sample size is set to 40.
In simple random sampling, every country has the same probability of being selected, so 40 countries are drawn at random without replacement from the population of 166.
In systematic sampling, the frame is divided without bias into 40 groups (one per sampled unit) of about 166/40 ≈ 4 countries each. The first unit is selected at random from the first group, and the remaining 39 units are then taken at a fixed interval equal to the group size.
Stratified sampling applies when a larger group of data is broken into smaller groups based on some common characteristic and a certain number of units is picked from each group. In this analysis, the countries are broken into 6 subgroups according to the continent they are located on, and the units drawn from each subgroup are selected by the srswor method.
## [1] 12 10 11 4 0 3
## Stratum 1
##
## Population total and number of selected units: 43 12
## Stratum 2
##
## Population total and number of selected units: 45 10
## Stratum 3
##
## Population total and number of selected units: 49 11
## Stratum 4
##
## Population total and number of selected units: 12 4
## Stratum 5
##
## Population total and number of selected units: 15 3
## Number of strata 5
## Total number of selected units 40
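The three sampling schemes described above can be sketched as follows. This is a plain-Python illustration, not the code that produced the output above: the frame is hypothetical, with stratum labels `S1` to `S6` and stratum sizes chosen only to sum to 166, and the proportional allocation rule and the floor-based interval are assumptions about details the text does not spell out.

```python
import random
from collections import Counter, defaultdict

def srswor(frame, n, rng):
    """Simple random sampling without replacement: every unit is equally likely."""
    return rng.sample(frame, n)

def systematic(frame, n, rng):
    """Random start in the first group, then every k-th unit after it."""
    k = len(frame) // n        # group size; floor keeps all n picks inside the frame
    start = rng.randrange(k)   # random position within the first group
    # With the floor, the last len(frame) - n * k units can never be selected.
    return [frame[start + i * k] for i in range(n)]

def stratified(frame, n, stratum_of, rng):
    """Proportional allocation per stratum, then SRSWOR inside each stratum."""
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for units in strata.values():
        # Sizes stay attached to their own stratum; rounding can make the
        # total differ from n for other stratum sizes.
        size = round(n * len(units) / len(frame))
        sample.extend(rng.sample(units, size))
    return sample

# Hypothetical frame of 166 units in six strata (labels and sizes are assumptions).
sizes = {"S1": 43, "S2": 45, "S3": 49, "S4": 12, "S5": 15, "S6": 2}
frame = [(f"{name}-{i}", name) for name, size in sizes.items() for i in range(size)]

rng = random.Random(2020)  # fixed seed so the sketch is reproducible
simple = srswor(frame, 40, rng)
systematic_sample = systematic(frame, 40, rng)
stratified_sample = stratified(frame, 40, lambda unit: unit[1], rng)
print(Counter(stratum for _, stratum in stratified_sample))
```

Computing each stratum's sample size inside the loop over strata keeps every size paired with the stratum it was computed for, so a size intended for one group cannot be applied to another.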
As shown above, the SRSWOR and systematic samples come closest to the mean of the data. The stratified sample deviates from the data mean in the Africa and North America groups, indicating that these groups might need to be divided further into subgroups based on appropriate characteristics.
Throughout the analysis, the distribution of happiness scores, and its spread, has been shown to vary from continent to continent, suggesting uneven development both between continents and between countries on the same continent. In a nutshell, by continent, Africa has the lowest average happiness score, North America is the happiest continent on average, Europe has the happiest country in the world, and the gap in happiness between countries is narrowest in South America. Among the six common factors used to measure happiness, the economy and health are the most closely associated with happiness. However, the correlation does not hold for every group. To further understand the relationship between happiness and these factors and to inform policy-making decisions, continents should be divided into smaller regions by their common characteristics, and data for more countries should be included.